Abstract

We provide a necessary and sufficient condition under which a convex set is approachable in a game with partial monitoring, i.e. where players do not observe their opponents' moves but receive random signals. This condition is an extension of Blackwell's criterion in the full monitoring framework, where players observe at least their payoffs. When our condition is fulfilled, we explicitly construct an approachability strategy, derived from a strategy satisfying an internal consistency property in an auxiliary game.

We also provide an example of a convex set that is neither (weakly) approachable nor (weakly) excludable, a situation that cannot occur in the full monitoring case.

Finally, we apply our result to describe an $\varepsilon$-optimal strategy of the uninformed player in a zero-sum repeated game with incomplete information on one side.

Key Words: Repeated Games, Blackwell Approachability, Partial Monitoring, Convex Sets, Incomplete Information

Introduction 



Blackwell [4] introduced the notion of approachability in two-person (infinitely) repeated games with vector payoffs in some Euclidean space $\mathbb{R}^d$, as an analogue of von Neumann's minmax theorem. A player can approach a given set $E \subset \mathbb{R}^d$ if he can ensure that, after some stage and with high probability, the average payoff will always remain close to $E$. Blackwell [4] proved that if both players observe their payoffs and $E$ satisfies some geometric condition ($E$ is then called a B-set), then Player 1 can approach it. He also deduced that, given a convex set $C$, either Player 1 can approach it or Player 2 can exclude it, i.e. the latter can approach the complement of a neighborhood of $C$. As Soulaimani, Quincampoix & Sorin [1] have recently proved that the notions of B-set (in a given repeated game) and discriminating domain (for a suitably chosen differential game) coincide.

We consider the partial monitoring framework, where players do not observe their opponent's moves but receive random signals. In Section 1.2, we provide a necessary and sufficient condition under which a convex set is approachable. We also construct an approachability strategy derived from the construction (following Perchet [10]) of a strategy that has no internal regret (internal consistency in this framework has been defined by Lehrer & Solan [8], Definition 9).

Three classical results that hold in the full monitoring case do not extend to the partial monitoring framework. Indeed, in a specific game introduced in Section 3.2, there exists a convex set $C$ that is neither approachable by Player 1 nor excludable by Player 2 (see Theorem 3 in Blackwell [4]). Moreover, $C$ is not approachable by Player 1 while every half-space that contains it is approachable by Player 1 (see Corollary 2 in Blackwell [4]). Finally, $C$ is neither weakly approachable nor weakly excludable (see Vieille [12]). We recall that weak approachability, a weaker notion than approachability also introduced by Blackwell [4], concerns finitely repeated games (see Definition 1.2 in Section 1).

Kohlberg [7] used the notion of approachability in order to construct optimal strategies of the uninformed player in the class of zero-sum repeated games with incomplete information on one side (introduced by Aumann & Maschler [2]). Our result can be used in this framework to provide a simple proof of the existence of a value in the infinitely repeated game, through the construction of an $\varepsilon$-optimal strategy of Player 2.

1 Approachability 

Consider a two-person game $\Gamma$ repeated in discrete time. At stage $n \in \mathbb{N}$, Player 1 (resp. Player 2) chooses an action $i_n \in I$ (resp. $j_n \in J$), where both sets $I$ and $J$ are finite. This generates a vector payoff $\rho_n = \rho(i_n, j_n) \in \mathbb{R}^d$, where $\rho$ is a mapping from $I \times J$ to $\mathbb{R}^d$. Player 1 observes neither $j_n$ nor $\rho_n$ but receives a random signal $s_n \in S$ whose law is $s(i_n, j_n)$, where $s$ is a mapping from $I \times J$ to $\Delta(S)$ (the set of probabilities over the finite set $S$). Player 2 observes $i_n$, $j_n$ and $s_n$. The choices of $i_n$ and $j_n$ depend only on the past observations of the players and may be random.

Explicitly, a strategy $\sigma$ of Player 1 is a mapping from $H^1$ to $\Delta(I)$, where $H^1 = \bigcup_{n \in \mathbb{N}}(I \times S)^n$ is the set of finite histories available to Player 1. After the finite history $h^1_n \in (I \times S)^n$, $\sigma(h^1_n) \in \Delta(I)$ is the law of $i_{n+1}$. Similarly, a strategy $\tau$ of Player 2 is a mapping from $H^2 = \bigcup_{n \in \mathbb{N}}(I \times S \times J)^n$ to $\Delta(J)$. A couple of strategies $(\sigma,\tau)$ generates a probability, denoted by $\mathbb{P}_{\sigma,\tau}$, over $\mathcal{H} = (I \times S \times J)^{\mathbb{N}}$, the set of plays endowed with the cylinder $\sigma$-field.

The two functions $\rho$ and $s$ are extended multilinearly to $\Delta(I) \times \Delta(J)$ by $\rho(x,y) = \mathbb{E}_{x,y}[\rho(i,j)] \in \mathbb{R}^d$ and $s(x,y) = \mathbb{E}_{x,y}[s(i,j)] \in \Delta(S)$.

The following notation will be used: for any sequence $a = \{a_m \in \mathbb{R}^d\}_{m \in \mathbb{N}}$, the average of $a$ up to stage $n$ is denoted by $\bar a_n := \sum_{m=1}^n a_m / n$ and, for any set $E \subset \mathbb{R}^d$, the distance to $E$ is denoted by $d_E(z) := \inf_{e \in E}\|z - e\|$, where $\|\cdot\|$ is the Euclidean norm.

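Definition 1.1 (Blackwell [4]) i) A closed set $E \subset \mathbb{R}^d$ is approachable by Player 1 if for every $\varepsilon > 0$, there exist a strategy $\sigma_\varepsilon$ of Player 1 and $N \in \mathbb{N}$ such that for every strategy $\tau$ of Player 2 and every $n \ge N$:

$$\mathbb{E}_{\sigma_\varepsilon,\tau}\big[d_E(\bar\rho_n)\big] \le \varepsilon \quad\text{and}\quad \mathbb{P}_{\sigma_\varepsilon,\tau}\Big(\sup_{n \ge N} d_E(\bar\rho_n) \ge \varepsilon\Big) \le \varepsilon.$$

Such a strategy $\sigma_\varepsilon$ is called an $\varepsilon$-approachability strategy of $E$.

ii) A set $E$ is excludable by Player 2 if there exists $\delta > 0$ such that the complement of $E^\delta$ is approachable by Player 2, where $E^\delta = \{z \in \mathbb{R}^d;\ d_E(z) \le \delta\}$.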

In words, a set $E \subset \mathbb{R}^d$ is approachable by Player 1 if he can ensure that the average payoff converges almost surely to $E$, uniformly with respect to the strategies of Player 2. Obviously, a set $E$ cannot be both approachable by Player 1 and excludable by Player 2.

Definition 1.2 i) A closed set $E$ is weakly approachable by Player 1 if for every $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that for every $n \ge N$, there is some strategy $\sigma_n$ of Player 1 such that for every strategy $\tau$ of Player 2:

$$\mathbb{E}_{\sigma_n,\tau}\big[d_E(\bar\rho_n)\big] \le \varepsilon.$$

ii) A set $E$ is weakly excludable by Player 2 if there exists $\delta > 0$ such that the complement of $E^\delta$ is weakly approachable by Player 2.
We emphasize that, in the definition of weak approachability, the strategy of Player 1 may depend on $n$, the length of the game, which is not the case in the definition of approachability.

1.1 Full monitoring case

A game satisfies full monitoring if Player 1 observes the moves of Player 2, i.e. if $S = J$ and $s(i,j) = j$. Blackwell [4] gave a sufficient geometric condition under which a closed set $E$ is approachable by Player 1. He also provided a full characterization for convex sets. Stating his condition requires the following notation: $\Pi_E(z) = \{e \in E;\ d_E(z) = \|z - e\|\}$ is the set of closest points to $z \in \mathbb{R}^d$ in $E$, and $P^1(x) = \{\rho(x,y);\ y \in \Delta(J)\}$ (resp. $P^2(y) = \{\rho(x,y);\ x \in \Delta(I)\}$) is the set of expected payoffs compatible with $x \in \Delta(I)$ (resp. $y \in \Delta(J)$).

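Definition 1.3 A closed set $E$ of $\mathbb{R}^d$ is a B-set if for every $z \in \mathbb{R}^d$, there exist $p \in \Pi_E(z)$ and $x\ (= x(z)) \in \Delta(I)$ such that the hyperplane through $p$ and perpendicular to $z - p$ separates $z$ from $P^1(x)$, or formally:

$$\forall z \in \mathbb{R}^d,\ \exists p \in \Pi_E(z),\ \exists x \in \Delta(I),\quad \langle \rho(x,y) - p,\ z - p\rangle \le 0,\ \forall y \in \Delta(J). \qquad (1)$$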

Condition (1), and therefore Theorem 1.4, does not require that Player 1 observe Player 2's moves, but only his own payoffs (which was Blackwell's assumption).

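Theorem 1.4 (Blackwell [4]) A B-set $E$ is approachable by Player 1.

Moreover, consider the strategy $\sigma$ of Player 1 defined by $\sigma(h_n) = x(\bar\rho_n)$. Then for every strategy $\tau$ of Player 2, every $N \in \mathbb{N}$ and every $\eta > 0$:

$$\mathbb{E}_{\sigma,\tau}\big[d_E^2(\bar\rho_n)\big] \le \frac{4B}{n} \quad\text{and}\quad \mathbb{P}_{\sigma,\tau}\Big(\sup_{n \ge N} d_E(\bar\rho_n) \ge \eta\Big) \le \frac{8B}{\eta^2 N}, \qquad (2)$$

with $B = \sup_{i,j}\|\rho(i,j)\|^2$.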
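As a rough illustration of Theorem 1.4, the Python sketch below plays Blackwell's strategy in the simplest possible setting: scalar payoffs ($d = 1$), two actions per player and a target interval $C$. The payoff matrix `RHO`, the interval, the uniformly random opponent and all helper names are hypothetical choices made for the example. When the current average lies above $C$, condition (1), with $p$ the upper endpoint of $C$, reduces to picking $x$ such that $\rho(x,y)$ never exceeds that endpoint, so the sketch minimizes the larger of the two column payoffs; the case below $C$ is symmetric.

```python
import random

# Hypothetical 2x2 scalar game: RHO[i][j] = rho(i, j).
RHO = [[1.0, 0.0],
       [0.0, 2.0]]
C_LO, C_HI = 0.6, 0.7  # hypothetical target set C = [C_LO, C_HI]


def dist_to_C(z):
    """Distance d_C(z) from a real number z to the interval C."""
    return max(C_LO - z, 0.0, z - C_HI)


def rho(x, j):
    """Expected payoff rho(x, j) when row 0 is played with probability x."""
    return x * RHO[0][j] + (1.0 - x) * RHO[1][j]


def blackwell_x(sign):
    """Mixed action minimizing max_j sign * rho(x, j) over x in [0, 1].

    sign = +1 pushes payoffs down (average above C); sign = -1 pushes them
    up (average below C). The maximum of two affine functions of x is
    minimized at an endpoint or where the two column payoffs coincide.
    """
    slope0, slope1 = RHO[0][0] - RHO[1][0], RHO[0][1] - RHO[1][1]
    intercept0, intercept1 = RHO[1][0], RHO[1][1]
    candidates = [0.0, 1.0]
    if slope0 != slope1:
        x_eq = (intercept1 - intercept0) / (slope0 - slope1)
        if 0.0 <= x_eq <= 1.0:
            candidates.append(x_eq)
    return min(candidates, key=lambda x: max(sign * rho(x, j) for j in (0, 1)))


def simulate(n_stages, seed=0):
    """Play Blackwell's strategy against a uniformly random opponent."""
    rng = random.Random(seed)
    total = 0.0
    for n in range(n_stages):
        avg = total / n if n > 0 else 0.5 * (C_LO + C_HI)
        if avg > C_HI:
            x = blackwell_x(+1)
        elif avg < C_LO:
            x = blackwell_x(-1)
        else:
            x = 0.5  # inside C: any mixed action satisfies the condition
        i = 0 if rng.random() < x else 1
        j = rng.randrange(2)  # hypothetical opponent
        total += RHO[i][j]
    return total / n_stages


if __name__ == "__main__":
    avg = simulate(50_000)
    print(f"average payoff {avg:.4f}, distance to C {dist_to_C(avg):.4f}")
```

The sketch only checks on which side of $C$ the average lies, which is all the hyperplane condition requires in dimension one; in higher dimension the same step amounts to solving the zero-sum game with payoff $\langle \rho(x,y), z - p\rangle$.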

For a closed convex set $C$, a full characterization is available:

Corollary 1.5 (Blackwell [4]) A closed convex set $C \subset \mathbb{R}^d$ is approachable by Player 1 if and only if:

$$P^2(y) \cap C \neq \emptyset, \quad \forall y \in \Delta(J). \qquad (3)$$
Using a minmax argument, Blackwell [4] proved that condition (3) implies condition (1); therefore $C$ is a B-set and is approachable by Player 1. This characterization implies the following properties of convex sets:

Corollary 1.6 (Blackwell [4]) 1. A closed convex set $C$ is either approachable by Player 1 or excludable by Player 2.

2. A closed convex set $C$ is approachable by Player 1 if and only if every half-space that contains $C$ is approachable by Player 1.

If condition (3) is not fulfilled for some $y_0 \in \Delta(J)$, then (by the law of large numbers) Player 2 just has to play according to $y_0$ at each stage to exclude $C$. If every half-space that contains $C$ is approachable, then $C$ is a B-set. Conversely, any set that contains an approachable set is approachable.
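For scalar payoffs, $P^2(y)$ is the segment spanned by the row payoffs against $y$ (since $\rho(x,y)$ is linear in $x$), so condition (3) can be checked numerically. The sketch below is only a grid-based approximation, assuming two actions for Player 2 ($y$ is the probability of the first column); the grid resolution, the function name and the example matrix are hypothetical. It returns a violating $y_0$ when it finds one, i.e. a stationary mixed action with which Player 2 excludes $C$ by the argument above.

```python
def find_violation(rho, c_lo, c_hi, steps=1000):
    """Return some y0 on the grid with P^2(y0) and [c_lo, c_hi] disjoint, else None.

    rho[i][j] are scalar payoffs, Player 2 has two actions and y is the
    probability of column 0. P^2(y) = {rho(x, y); x} is the segment spanned
    by the row payoffs, because rho(x, y) is linear in x.
    """
    for k in range(steps + 1):
        y = k / steps
        row_payoffs = [y * r[0] + (1 - y) * r[1] for r in rho]
        lo, hi = min(row_payoffs), max(row_payoffs)
        if max(lo, c_lo) > min(hi, c_hi):  # the two segments do not intersect
            return y
    return None


RHO = [[1.0, 0.0], [0.0, 2.0]]  # hypothetical payoff matrix
print(find_violation(RHO, 0.6, 0.7))  # None: condition (3) holds on the grid
print(find_violation(RHO, 3.0, 4.0))  # 0.0: Player 2 excludes C = [3, 4]
```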

Blackwell also conjectured the following result on weak approachability, proved by Vieille:

Theorem 1.7 (Vieille [12]) A closed set is either weakly approachable by Player 1 or weakly excludable by Player 2.

Vieille [12] constructed a differential game $\mathcal{D}$ (in continuous time and with finite length) such that the finite repetitions of $\Gamma$ can be seen as a discretization of $\mathcal{D}$. The existence of the value of $\mathcal{D}$ implies the result.

1.2 Partial monitoring case 

The main objective of this section is to provide a simple necessary and 
sufficient condition under which a convex set C is approachable in the partial 
monitoring case. 

Before stating it, we introduce the following notation: the vector of probabilities over $S$ defined by $\mathbf{s}(y) = (s(i,y))_{i \in I} \in \Delta(S)^I$ is called the flag generated by $y \in \Delta(J)$. This flag is not observed by Player 1 since, if he plays $i \in I$, he only observes a signal $s$ which is the realization of the $i$-th component of $\mathbf{s}(y)$. However, it is theoretically the maximal information available to him about $y \in \Delta(J)$. Indeed, Player 1 will never be able to distinguish between two mixed actions $y$ and $y'$ that generate the same flag, i.e. such that $\mathbf{s}(y) = \mathbf{s}(y')$.

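Given a flag $\mu$ in $\mathcal{S}$, the range of $\mathbf{s}$, the set $\mathbf{s}^{-1}(\mu) = \{y \in \Delta(J);\ \mathbf{s}(y) = \mu\}$ is the set of mixed actions of Player 2 compatible with $\mu$, and $P(x,\mu) = \{\rho(x,y);\ y \in \mathbf{s}^{-1}(\mu)\}$ is the set of expected payoffs compatible with $x \in \Delta(I)$ and $\mu \in \mathcal{S}$.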
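Concretely, the $i$-th component of the flag is the mixture $\sum_j y_j\, s(i,j)$. The following Python sketch computes flags under a hypothetical encoding (signal laws stored as dictionaries from signals to probabilities) for the two extreme cases: under full monitoring the flag reveals $y$, whereas without any signal all mixed actions generate the same flag, so that $\mathbf{s}^{-1}(\mu) = \Delta(J)$.

```python
from collections import defaultdict


def flag(y, signal_law, rows):
    """Return the flag s(y) = (s(i, y))_{i in rows}.

    y: dict mapping each action j of Player 2 to its probability.
    signal_law(i, j): dict mapping each signal to its probability s(i, j).
    The i-th component is the mixture sum_j y[j] * s(i, j), a law on S.
    """
    result = []
    for i in rows:
        law = defaultdict(float)
        for j, y_j in y.items():
            for signal, prob in signal_law(i, j).items():
                law[signal] += y_j * prob
        result.append(dict(law))
    return result


def full_monitoring(i, j):
    return {j: 1.0}  # S = J and s(i, j) = j


def no_signal(i, j):
    return {"*": 1.0}  # a single, uninformative signal


ROWS = ["T", "B"]
print(flag({"L": 0.25, "R": 0.75}, full_monitoring, ROWS))
# [{'L': 0.25, 'R': 0.75}, {'L': 0.25, 'R': 0.75}]
print(flag({"L": 0.25, "R": 0.75}, no_signal, ROWS)
      == flag({"L": 1.0, "R": 0.0}, no_signal, ROWS))
# True
```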

Our main result is: 
Theorem 1.8 A closed convex set $C \subset \mathbb{R}^d$ is approachable by Player 1 if and only if:

$$\forall \mu \in \mathcal{S},\ \exists x \in \Delta(I),\quad P(x,\mu) \subset C. \qquad (4)$$

$P(x,\cdot)$ can be extended to $\Delta(S)^I$ (without changing condition (4)) by defining, for every $\mu \notin \mathcal{S}$, either $P(x,\mu) = \emptyset$ or $P(x,\mu) = P(x,\Pi_{\mathcal{S}}(\mu))$, where $\Pi_{\mathcal{S}}(\cdot)$ is the projection onto $\mathcal{S}$.

In the full monitoring case, condition (4) is exactly condition (3). Indeed, if Player 1 observes Player 2's action, then $S = J$, $\mathcal{S} = \{(y,\dots,y) \in \Delta(J)^I;\ y \in \Delta(J)\}$ and, given $\mathbf{y} = (y,\dots,y) \in \mathcal{S}$, $P(x,\mathbf{y}) = \{\rho(x,y)\}$. Condition (4) then states that for every $y \in \Delta(J)$ there exists $x \in \Delta(I)$ such that $\rho(x,y) \in C$, or equivalently $P^2(y) \cap C \neq \emptyset$.

Another important result is that Corollary 1.6 and Theorem 1.7 do not extend:

Proposition 1.9 1. There exists a closed convex set that is neither approachable by Player 1 nor excludable by Player 2.

2. A half-space is either approachable by Player 1 or excludable by Player 2.

3. There exists a closed convex set that is not approachable by Player 1 while every half-space that contains it is approachable by Player 1.

4. There exists a closed convex set that is neither weakly approachable by Player 1 nor weakly excludable by Player 2.

As said in the introduction, the proof of Theorem 1.8 relies on the construction of a strategy that has no internal regret in an auxiliary game with partial monitoring.

2 Internal regret with partial monitoring 

Consider the following two-person repeated game $\mathcal{G}$ with partial monitoring. At stage $n \in \mathbb{N}$, we denote by $x_n \in \Delta(I)$ and $y_n \in \Delta(J)$ the mixed actions chosen by Player 1 and Player 2 (i.e. the laws of $i_n$ and $j_n$). As before, we denote by $s_n$ the signal observed by Player 1, whose law is the $i_n$-th coordinate of $\mu_n = \mathbf{s}(j_n)$.

Although payoffs are unobserved, given a flag $\mu \in \Delta(S)^I$ and $x \in \Delta(I)$, Player 1 evaluates his payoff through $G(x,\mu)$, where $G$ is a continuous map from $\Delta(I) \times \Delta(S)^I$ to $\mathbb{R}$, not necessarily linear.
In the full monitoring framework, Foster & Vohra [5] defined internally consistent strategies (or strategies that have no internal regret) as follows: Player 1 has asymptotically no internal regret if, for every $i \in I$, either the action $i$ is a best response to his opponent's empirical distribution of actions on the set of stages where he actually played $i$, or the density of this set (also called the frequency of the action $i$) converges to zero.
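To make the full monitoring definition concrete, the sketch below computes the internal regret of each action $i$: the largest average gain Player 1 would have obtained by switching from $i$ to some other action $k$ on the stages where he played $i$. Normalizing by the total number of stages $n$ builds in the clause "or the frequency of the action $i$ converges to zero". The function name, the coordination game and the histories are hypothetical.

```python
def internal_regret(played, opponent, payoff, actions):
    """Internal regret of each action after n = len(played) stages.

    R(i) = max_k (1/n) * sum_{m : i_m = i} [payoff[k][j_m] - payoff[i][j_m]].
    Dividing by n (not by the number of stages where i was played) makes
    R(i) small when i is a near best response on those stages or when
    those stages are rare.
    """
    n = len(played)
    regrets = {}
    for i in actions:
        js = [j for a, j in zip(played, opponent) if a == i]
        regrets[i] = max(sum(payoff[k][j] - payoff[i][j] for j in js) / n
                         for k in actions)
    return regrets


# Hypothetical coordination game: payoff 1 when the two actions match.
U = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}
print(internal_regret([0, 0, 1, 1], [1, 1, 1, 1], U, actions=[0, 1]))
# {0: 0.5, 1: 0.0}
```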

In our framework, $G$ is not linear, so it may happen that no action $i \in I$ (that is, no Dirac mass on $i$) is ever a best response; best responses are indeed elements of $\Delta(I)$. Thus, if we want to define internal regret, we cannot distinguish the stages as a function of the actions actually played (i.e. $i_n \in I$), but rather as a function of the laws of the actions (i.e. $x_n \in \Delta(I)$).

We consider strategies described as follows: at stage $n$, Player 1 chooses (at random) a law $x(l_n)$ in a finite set $\{x(l) \in \Delta(I);\ l \in L\}$ and, given that choice, $i_n$ is drawn according to $x(l_n)$; $l_n$ is called the type of stage $n$.

We denote by $N_n(l) = \{1 \le m \le n;\ l_m = l\}$ the set of stages (up to the $n$-th) of type $l$ and, for any sequence $a = \{a_m \in \mathbb{R}^d\}_{m \in \mathbb{N}}$, $\bar a_n(l) = \sum_{m \in N_n(l)} a_m / |N_n(l)|$ is the average of $a$ on $N_n(l)$.

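Definition 2.1 For every $n \in \mathbb{N}$ and every $l \in L$, the internal regret of type $l$ at stage $n$ is

$$\mathcal{R}_n(l) = \sup_{x \in \Delta(I)}\Big[G\big(x, \bar\mu_n(l)\big) - G\big(x(l), \bar\mu_n(l)\big)\Big],$$

where $\bar\mu_n(l)$ is the unobserved average flag on $N_n(l)$.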

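A strategy $\sigma$ of Player 1 is $(L,\varepsilon)$-internally consistent if for every strategy $\tau$ of Player 2:

$$\limsup_{n \to \infty}\ \max_{l \in L}\ \frac{|N_n(l)|}{n}\Big(\mathcal{R}_n(l) - \varepsilon\Big) \le 0, \quad \mathbb{P}_{\sigma,\tau}\text{-a.s.}$$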

The set $L$ is assumed to be finite; otherwise there would exist trivial strategies such that the frequency of every $x(l)$ converges to zero. In words, if $\sigma$ is an $(L,\varepsilon)$-internally consistent strategy, then either $x(l)$ is an $\varepsilon$-best response to $\bar\mu_n(l)$, the unobserved average flag on $N_n(l)$, or this set has a very small density.
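The same bookkeeping can be written with types instead of actions. The sketch below computes the weighted regrets $\frac{|N_n(l)|}{n}\mathcal{R}_n(l)$ of Definition 2.1 under simplifying assumptions: Player 1 has two actions (a mixed action is the probability of the first one), the supremum over $\Delta(I)$ is replaced by a maximum over a finite grid, and the flags are passed in as tuples, although in the game they are not observed. The evaluation function `G` is a hypothetical nonlinear example whose only best response to $\mu$ is $x = \mu_1$.

```python
def weighted_regrets(types, flags, x_of, G, grid):
    """Return {l: (|N_n(l)| / n) * R_n(l)} after n = len(types) stages.

    types[m] : type l_m of stage m
    flags[m] : flag mu_m of stage m (a tuple of floats)
    x_of[l]  : mixed action x(l) associated with type l
    G(x, mu) : evaluation function, possibly nonlinear in x
    grid     : finite list of mixed actions approximating Delta(I)
    """
    n = len(types)
    out = {}
    for l, x_l in x_of.items():
        own = [mu for t, mu in zip(types, flags) if t == l]  # flags on N_n(l)
        if not own:
            out[l] = 0.0
            continue
        avg = tuple(sum(c) / len(own) for c in zip(*own))    # average flag
        regret = max(G(x, avg) for x in grid) - G(x_l, avg)  # R_n(l)
        out[l] = len(own) / n * regret
    return out


# Hypothetical example: x is the probability of the first action, flags are
# one-dimensional, and G(x, mu) = -|x - mu_1| is maximized at x = mu_1 only.
def G(x, mu):
    return -abs(x - mu[0])


grid = [k / 100 for k in range(101)]
print(weighted_regrets([0, 0, 1, 1], [(0.25,)] * 4, {0: 0.25, 1: 0.75}, G, grid))
# {0: 0.0, 1: 0.25}
```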

Theorem 2.2 (Lehrer & Solan [8]; Perchet [10]) For every $\varepsilon > 0$, there exist a finite set $L$ and an $(L,\varepsilon)$-internally consistent strategy $\sigma$ such that, uniformly with respect to the strategy $\tau$ of Player 2,

$$\mathbb{E}_{\sigma,\tau}\Big[\sup_{l \in L}\frac{|N_n(l)|}{n}\Big(\mathcal{R}_n(l) - \varepsilon\Big)\Big] \xrightarrow[n \to \infty]{} 0$$

and

$$\forall \eta > 0,\quad \mathbb{P}_{\sigma,\tau}\Big(\exists n \ge N,\ \exists l \in L,\ \frac{|N_n(l)|}{n}\Big(\mathcal{R}_n(l) - \varepsilon\Big) > \eta\Big) \xrightarrow[N \to \infty]{} 0.$$

3 Proofs of the main results 

This section is devoted to the proofs of the theorems stated in the previous 
section. 



3.1 Proof of Theorem 1.8

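Let $C$ be a convex set such that for every $\mu \in \Delta(S)^I$ there exists $x_\mu \in \Delta(I)$ such that $P(x_\mu,\mu) \subset C$. Given $\varepsilon > 0$, we are going to construct an $\varepsilon$-approachability strategy in $\Gamma$ based on an $(L,\varepsilon)$-internally consistent strategy in some auxiliary game $\mathcal{G}$, where the evaluation function $G$ is defined by:

$$G(x,\mu) = -\sup_{y \in \mathbf{s}^{-1}(\mu)} d_C\big(\rho(x,y)\big)$$

if $\mu \in \mathcal{S}$. If $\mu \notin \mathcal{S}$, then $G(x,\mu) = G(x,\Pi_{\mathcal{S}}(\mu))$, where $\Pi_{\mathcal{S}}$ is the projection onto $\mathcal{S}$.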

Sufficiency: Any strategy in the auxiliary game $\mathcal{G}$ naturally defines a strategy in the original game $\Gamma$. The main idea of the proof is quite simple: given $\varepsilon > 0$, consider the finite family $\{x(l);\ l \in L\}$ and the $(L,\varepsilon)$-internally consistent strategy $\sigma$ of Player 1 given by Theorem 2.2. Then, for every $l \in L$, either $|N_n(l)|/n$ is very small, or $\mathcal{R}_n(l) \le \varepsilon$. In the latter case, the definition of $G$ implies that $\bar\rho_n(l)$ is $\varepsilon$-close to $C$. Since

$$\bar\rho_n = \sum_{l \in L}\frac{|N_n(l)|}{n}\,\bar\rho_n(l), \qquad (5)$$

$\bar\rho_n$ is a convex combination of terms that are $\varepsilon$-close to $C$. Since $C$ is convex, $\bar\rho_n$ is also close to $C$.

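Formally, let $\sigma$ be an $(L,\varepsilon)$-internally consistent strategy of Player 1 given by Theorem 2.2. For every $\theta > 0$, there exists $N^1 \in \mathbb{N}$ such that for any strategy $\tau$ of Player 2:

$$\mathbb{P}_{\sigma,\tau}\Big(\forall n \ge N^1,\ \sup_{l \in L}\frac{|N_n(l)|}{n}\Big(\mathcal{R}_n(l) - \varepsilon\Big) \le \theta\Big) \ge 1 - \theta. \qquad (6)$$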

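Recall that for any $\mu \in \Delta(S)^I$ there exists $x_\mu \in \Delta(I)$ such that $P(x_\mu,\mu) \subset C$; therefore $\sup_{z \in \Delta(I)} G(z,\mu) = G(x_\mu,\mu) = 0$ and

$$\mathcal{R}_n(l) = \sup_{y \in \mathbf{s}^{-1}(\bar\mu_n(l))} d_C\big(\rho(x(l),y)\big) \ge d_C\big(\rho(x(l),\bar\jmath_n(l))\big),$$

where $\bar\jmath_n(l) \in \Delta(J)$ is the empirical distribution of Player 2's actions on $N_n(l)$, because $\mathbf{s}(\bar\jmath_n(l)) = \bar\mu_n(l)$ by linearity of $\mathbf{s}$.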

The random variables $l_n$ and $j_n$ are independent (given the finite histories), and so are $i_n$ and $j_n$ given $l_n$. Thus the Hoeffding–Azuma inequality [3], [6] for sums of bounded martingale differences implies that $\rho(x(l),\bar\jmath_n(l))$ is asymptotically close to $\bar\rho_n(l)$. Explicitly, for every $\theta > 0$, there exists $N^2 \in \mathbb{N}$ (independent of $\sigma$ and $\tau$) such that:

$$\mathbb{P}_{\sigma,\tau}\Big(\forall n \ge N^2,\ \forall l \in L,\ \frac{|N_n(l)|}{n}\big\|\bar\rho_n(l) - \rho(x(l),\bar\jmath_n(l))\big\| \le \theta\Big) \ge 1 - \theta. \qquad (7)$$

Equations (6) and (7) imply that for every $n \ge N = \max\{N^1, N^2\}$ and every $l \in L$, with probability at least $1 - 2\theta$:

$$\frac{|N_n(l)|}{n}\Big(d_C\big(\bar\rho_n(l)\big) - \varepsilon\Big) \le 2\theta.$$

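Since $C$ is a convex set, $d_C(\cdot)$ is convex; thus, for any strategy $\tau$ of Player 2, with $\mathbb{P}_{\sigma,\tau}$-probability at least $1 - 2\theta$, for every $n \ge N$:

$$d_C(\bar\rho_n) \le \sum_{l \in L}\frac{|N_n(l)|}{n}\, d_C\big(\bar\rho_n(l)\big) \le 2|L|\theta + \varepsilon,$$

and $C$ is approachable by Player 1.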

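Necessity: Conversely, assume that there exists $\mu_0 \in \Delta(S)^I$ such that, for all $x \in \Delta(I)$, there is some $y\ (= y(x)) \in \mathbf{s}^{-1}(\mu_0)$ such that $d_C(\rho(x,y)) > 0$. Since $\Delta(I)$ is compact, we can assume that there exists $\delta > 0$ such that

$$d_C\big(\rho(x,y(x))\big) \ge \delta, \quad \forall x \in \Delta(I).$$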

Let $\mathcal{T}_0$ be the subset of strategies of Player 2 that generate at every stage the same flag $\mu_0$ (explicitly, a strategy $\tau$ belongs to $\mathcal{T}_0$ if $\tau(h^2) \in \mathbf{s}^{-1}(\mu_0)$ for every finite history $h^2$). Recall that a strategy $\sigma$ of Player 1 depends only on his past actions and on the signals he received. Since, at any stage, two strategies $\tau$ and $\tau'$ in $\mathcal{T}_0$ induce the same laws of signals, the couples $(\sigma,\tau)$ and $(\sigma,\tau')$ generate the same probability on the infinite sequences of moves of Player 1. Therefore $\mathbb{E}_{\sigma,\tau}[\bar\imath_n] = \mathbb{E}_{\sigma,\tau'}[\bar\imath_n] := \bar x_n$ is independent of $\tau$.

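For every $n \in \mathbb{N}$, define the strategy $\tau_n$ in $\mathcal{T}_0$ by $\tau_n(h) = y(\bar x_n)$ for every finite history $h$. Since $d_C(\cdot)$ is convex, by Jensen's inequality

$$\mathbb{E}_{\sigma,\tau_n}\big[d_C(\bar\rho_n)\big] \ge d_C\big(\mathbb{E}_{\sigma,\tau_n}[\bar\rho_n]\big).$$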

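Since $j_m$ is independent of the history $h_{m-1}$:

$$\mathbb{E}_{\sigma,\tau_n}\big[\rho(i_m,j_m)\,\big|\,h_{m-1}\big] = \mathbb{E}_{\sigma,\tau_n}\big[\rho(i_m,y(\bar x_n))\,\big|\,h_{m-1}\big];$$

hence, by linearity of $\rho(\cdot,y(\bar x_n))$,

$$\mathbb{E}_{\sigma,\tau_n}\big[\rho(i_m,j_m)\,\big|\,h_{m-1}\big] = \rho\big(\mathbb{E}_{\sigma,\tau_n}[i_m\,|\,h_{m-1}],\ y(\bar x_n)\big).$$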



Therefore $\mathbb{E}_{\sigma,\tau_n}[\bar\rho_n] = \rho(\bar x_n, y(\bar x_n))$. Consequently

$$\mathbb{E}_{\sigma,\tau_n}\big[d_C(\bar\rho_n)\big] \ge d_C\big(\mathbb{E}_{\sigma,\tau_n}[\bar\rho_n]\big) = d_C\big(\rho(\bar x_n, y(\bar x_n))\big) \ge \delta,$$

and, for any strategy $\sigma$ of Player 1 and any stage $n \in \mathbb{N}$, Player 2 has a strategy such that the expected distance from the average payoff to $C$ is at least $\delta > 0$. Thus $C$ is not approachable by Player 1.

Remark 3.1 The fact that $C$ is a convex set is crucial in both parts of the proof. In the sufficiency part, it would otherwise be possible that $\bar\rho_n(l) \in C$ for every $l \in L$, while $\bar\rho_n \notin C$. In the necessity part, the opposite could happen: $d_C(\mathbb{E}[\bar\rho_n]) \ge \delta$ while $\mathbb{E}[d_C(\bar\rho_n)] = 0$.

Remark 3.2 The $\varepsilon$-approachability strategy constructed relies on an $(L,\varepsilon)$-internally consistent strategy, so one can easily show the following.

Corollary 3.3 There exists a strategy $\sigma$ of Player 1 such that for every $\eta > 0$, there exists $N \in \mathbb{N}$ such that for every strategy $\tau$ of Player 2 and every $n \ge N$, $\mathbb{E}_{\sigma,\tau}[d_C(\bar\rho_n)] \le \eta$.

The proof is rather classical and relies on a careful concatenation of $\varepsilon_k$-approachability strategies (where the sequence $(\varepsilon_k)_{k \in \mathbb{N}}$ decreases towards 0), called the doubling trick (see e.g. Sorin [11], Proposition 3.2). It is therefore omitted.

3.2 Proof of Proposition 1.9

In the proof of Theorem 1.8, we have shown that if a convex set is not approachable by Player 1 then, for any of his strategies and any $n \in \mathbb{N}$, Player 2 has a strategy $\tau_n$ such that the expected distance from $\bar\rho_n$ to $C$ is at least $\delta$. This does not imply that $C$ is excludable by Player 2; indeed, this would require that $\tau_n$ depend neither on $\sigma$ nor on $n$. The proof of Proposition 1.9 relies mainly on the study of the following example.

Proof of Proposition 1.9. Consider the following two-person repeated matrix game, where Player 1 (the row player) receives no signal and his one-dimensional payoffs are defined by:

$$\begin{array}{c|cc} & L & R \\ \hline T & 0 & -1 \\ B & 1 & 0 \end{array}$$


$C := [0;1/2]$ is neither approachable nor excludable: The closed convex set $C := [0;1/2]$ is obviously not approachable by Player 1 (otherwise Theorem 1.8 would imply that there exists $x \in \Delta(I)$ such that $\rho(x,y) \in [0,1/2]$ for every $y \in \Delta(J)$). More precisely, given a strategy $\sigma$ of Player 1, we define $\tau_n$ as follows: if $\bar x_n$ (the expected frequency of T up to stage $n \in \mathbb{N}$, which does not depend on Player 2's strategy) is smaller than $1/4$, then $\tau_n$ is the strategy that always plays L; otherwise, it always plays R. Then the law of large numbers implies that, for $n$ large enough, $\mathbb{E}_{\sigma,\tau_n}[d_C(\bar\rho_n)]$ is bounded below by a quantity arbitrarily close to $1/4$.
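The rule defining $\tau_n$ can be checked with exact arithmetic. Assuming the payoffs $\rho(T,L) = 0$, $\rho(T,R) = -1$, $\rho(B,L) = 1$ and $\rho(B,R) = 0$, the Python sketch computes, for each expected frequency $x$ of T on a grid, the distance from $C = [0,1/2]$ to the expected payoff obtained when Player 2 follows the rule; by Jensen's inequality this distance is a lower bound on $\mathbb{E}_{\sigma,\tau_n}[d_C(\bar\rho_n)]$. The grid and the helper names are illustrative choices.

```python
from fractions import Fraction as F

# Assumed payoffs of Player 1, indexed by (row, column).
PAYOFF = {("T", "L"): F(0), ("T", "R"): F(-1),
          ("B", "L"): F(1), ("B", "R"): F(0)}
C_LO, C_HI = F(0), F(1, 2)


def d_C(z):
    """Distance from z to the interval C = [C_LO, C_HI]."""
    return max(C_LO - z, F(0), z - C_HI)


def expected_payoff(x, col):
    """rho(x, col) when T is played with probability x."""
    return x * PAYOFF[("T", col)] + (1 - x) * PAYOFF[("B", col)]


def response(x):
    """Player 2's rule: always L if x < 1/4, always R otherwise."""
    return "L" if x < F(1, 4) else "R"


grid = [F(k, 100) for k in range(101)]
worst = min(d_C(expected_payoff(x, response(x))) for x in grid)
print(worst)  # 1/4, attained at x = 1/4
```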

It remains to show that Player 2 cannot exclude $C$. We prove this by constructing a strategy $\sigma$ of Player 1 such that the average payoff is infinitely often close to 0: $\sigma$ is played in blocks whose lengths grow fast enough that each block is much longer than all the previous ones together (for instance, the $p$-th block is at least $p$ times as long as the union of the previous blocks). On odd blocks Player 1 plays T, while on even blocks he plays B. At the end of block $p$, the average payoff is at most $1/p$ if it is an odd block and at least $-1/p$ otherwise. Hence, on two consecutive blocks (the $p$-th and the $(p+1)$-th), there is at least one stage at which the average payoff is at distance smaller than $1/p$ from $\{0\}$. Therefore $\{0\}$, and hence $C$ (since it contains $\{0\}$), is not excludable by Player 2.
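A short simulation illustrates the block construction. It assumes the payoffs $\rho(T,L) = 0$, $\rho(T,R) = -1$, $\rho(B,L) = 1$, $\rho(B,R) = 0$, a hypothetical opponent who picks L or R uniformly at random, and the block lengths $\ell_1 = 1$, $\ell_p = p(\ell_1 + \dots + \ell_{p-1})$, which is one admissible fast-growing choice. For each pair of consecutive blocks, it records the smallest distance between the running average and 0, which the argument above bounds by $1/p$ whatever Player 2 does.

```python
import random

# Assumed payoffs of Player 1, indexed by (row, column).
PAYOFF = {("T", "L"): 0, ("T", "R"): -1, ("B", "L"): 1, ("B", "R"): 0}


def block_lengths(num_blocks):
    """l_1 = 1 and l_p = p * (l_1 + ... + l_{p-1}) for p >= 2."""
    lengths, total = [], 0
    for p in range(1, num_blocks + 1):
        length = 1 if p == 1 else p * total
        lengths.append(length)
        total += length
    return lengths


def min_distance_per_block_pair(num_blocks, seed=0):
    """Smallest |average payoff| over blocks p and p+1, for each p."""
    rng = random.Random(seed)
    total_payoff, stage = 0, 0
    per_block_min = []
    for p, length in enumerate(block_lengths(num_blocks), start=1):
        action = "T" if p % 2 == 1 else "B"  # T on odd blocks, B on even ones
        best = float("inf")
        for _ in range(length):
            col = rng.choice("LR")  # hypothetical opponent
            total_payoff += PAYOFF[(action, col)]
            stage += 1
            best = min(best, abs(total_payoff / stage))
        per_block_min.append(best)
    return {p: min(per_block_min[p - 1], per_block_min[p])
            for p in range(1, num_blocks)}


result = min_distance_per_block_pair(8)
for p, dist in result.items():
    print(p, round(dist, 4), dist <= 1 / p)
```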

A half-space is either approachable by Player 1 or excludable by Player 2: Let $E$ be a half-space that is not approachable by Player 1. Then there exists $\mu_0 \in \Delta(S)^I$ such that, for every $x \in \Delta(I)$, $P(x,\mu_0) \not\subset E$. This implies that there exists $\delta > 0$ such that $\inf_{x \in \Delta(I)}\sup_{y \in \mathbf{s}^{-1}(\mu_0)} d_E(\rho(x,y)) > \delta > 0$ and therefore, for every $x \in \Delta(I)$, there exists $y \in \Delta(J)$ such that $\rho(x,y)$ is in the complement of $E^\delta$, which is convex since $E$ is a half-space. Blackwell's result applies to Player 2 (since we assumed he has full monitoring), so he can approach the complement of $E^\delta$ and exclude $E$.

$C$ is not approachable by Player 1 while every half-space that contains it is: A half-space that contains $C$ contains either $(-\infty,0]$ or $[0,+\infty)$, which are approachable by, respectively, always playing T or always playing B.

$C$ is neither weakly approachable by Player 1 nor weakly excludable by Player 2: we proved that for every strategy $\sigma$ of Player 1 and every $n \in \mathbb{N}$ large enough, Player 2 has a strategy $\tau_n$ such that $\mathbb{E}_{\sigma,\tau_n}[d_C(\bar\rho_n)]$ is bounded below by a quantity arbitrarily close to $1/4$. Hence $C$ is not weakly approachable.

Conversely, let $\tau$ be a strategy of Player 2 in the game repeated $2n$ times (where $n$ is large enough) and let $M \in \mathbb{N}$ be any integer. Consider the strategy $\sigma$ of Player 1 that consists in playing T during the first $n$ stages. Since $\bar\rho_n$, the average payoff after those $n$ stages, belongs to $[0;1]$, there exists an integer $k_1 \in \{1,\dots,M\}$ such that $\bar\rho_n$ belongs to $[\frac{k_1-1}{M};\frac{k_1}{M}]$ with $\mathbb{P}_{\sigma,\tau}$-probability at least $\frac{1}{M}$. Note that, given $\tau$, Player 1 can compute this $k_1$.

Assume that, from stage $n+1$ on, the strategy $\sigma$ dictates to play i.i.d. action B with probability $\frac{k_1}{M}$ and action T with probability $1 - \frac{k_1}{M}$. If $n$ is large enough, the probability that the average payoff between stages $n+1$ and $2n$ belongs to $[-\frac{k_1}{M} - \frac{1}{M};\ 1 - \frac{k_1}{M} + \frac{1}{M}]$ is close to one (say larger than $1/2$; this is again a direct consequence of the law of large numbers). Therefore, this strategy $\sigma$ ensures that, with $\mathbb{P}_{\sigma,\tau}$-probability at least $\frac{1}{2M}$, the average payoff over the $2n$ stages belongs to $[-\frac{1}{M};\ \frac{1}{2} + \frac{1}{2M}]$.
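The interval arithmetic behind the last claim can be spelled out. Write $a_1$ for the average payoff over the first $n$ stages and $a_2$ for the average payoff over stages $n+1$ to $2n$. On the event where $a_1 \in [\frac{k_1-1}{M};\frac{k_1}{M}]$ and $a_2 \in [-\frac{k_1}{M}-\frac{1}{M};\ 1-\frac{k_1}{M}+\frac{1}{M}]$:

$$\bar\rho_{2n} = \frac{a_1 + a_2}{2} \ge \frac{1}{2}\Big(\frac{k_1-1}{M} - \frac{k_1}{M} - \frac{1}{M}\Big) = -\frac{1}{M},$$

$$\bar\rho_{2n} = \frac{a_1 + a_2}{2} \le \frac{1}{2}\Big(\frac{k_1}{M} + 1 - \frac{k_1}{M} + \frac{1}{M}\Big) = \frac{1}{2} + \frac{1}{2M},$$

so $\bar\rho_{2n}$ lies within $\frac{1}{M}$ of $C = [0;\frac{1}{2}]$, hence at distance at least $\frac{1}{M}$ from the complement of the $\frac{2}{M}$-neighborhood of $C$.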

Denote by $(C^{2/M})^c$ the complement of the $\frac{2}{M}$-neighborhood of $C$. Given a strategy $\tau$ of Player 2 and an integer $n$ large enough, the strategy $\sigma$ we described ensures that $\mathbb{E}_{\sigma,\tau}[d_{(C^{2/M})^c}(\bar\rho_{2n})] \ge \frac{1}{2M^2}$. Therefore, for every $M \in \mathbb{N}$, $(C^{2/M})^c$ is not weakly approachable by Player 2; hence $C$ is not weakly excludable.

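The strategy $\sigma$ we described can easily be made independent of $\tau$ by, for example, choosing $k_1 \in \{1,\dots,M\}$ uniformly at random. Since the intervals $[\frac{k-1}{M};\frac{k}{M}]$ cover $[0;1]$, the event that $\bar\rho_n$ falls in the interval indexed by the random $k_1$ still has probability at least $\frac{1}{M}$; this would imply that $\mathbb{E}_{\sigma,\tau}[d_{(C^{2/M})^c}(\bar\rho_{2n})] \ge \frac{1}{2M^2}$.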

These results still hold if one chooses $C_3 := [0;1/3]$ instead of $[0;1/2]$. In fact, it only remains to prove that $C_3$ is not weakly excludable by Player 2. Consider the game repeated $3n$ times and the strategy $\sigma$, defined by blocks of size $n$, that always plays T on the first block and, on the second block, plays i.i.d. action B with probability $\frac{k_1}{M}$, as above. The average payoff on those two blocks belongs to a small neighborhood of $[0;1/2]$, hence to some interval $[\frac{k_2-1}{M};\frac{k_2}{M}]$ (where $k_2$ is at most about $M/2$), with probability bounded below by a positive constant depending only on $M$. Assume that, on the third block, Player 1 plays i.i.d. action B with probability $\frac{2k_2}{M}$; then the average payoff over the three blocks belongs to a small neighborhood of $[0;1/3]$ with probability bounded below by a positive constant depending only on $M$. Therefore $C_3$ is not weakly excludable.

Since this proof can be generalized to any set $C_k = [0;1/k]$, even the singleton $\{0\}$ is neither weakly approachable nor weakly excludable; we recall that in the full monitoring framework all those convex sets are approachable by Player 1.

3.3 Remarks on the counterexample 

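Following the notation of Mertens, Sorin & Zamir [9] (see Definition 1.2, p. 149), Player 1 can guarantee $v$ in a zero-sum repeated game $\Gamma_\infty$ if

$$\forall \varepsilon > 0,\ \exists \sigma_\varepsilon,\ \exists N \in \mathbb{N},\quad \mathbb{E}_{\sigma_\varepsilon,\tau}[\bar\rho_n] \ge v - \varepsilon,\ \forall \tau,\ \forall n \ge N,$$

where $\sigma_\varepsilon$ is a strategy of Player 1 and $\tau$ any strategy of Player 2. Player 2 can defend $v$ if:

$$\forall \varepsilon > 0,\ \forall \sigma,\ \exists \tau,\ \exists N \in \mathbb{N},\quad \mathbb{E}_{\sigma,\tau}[\bar\rho_n] \le v + \varepsilon,\ \forall n \ge N.$$

If Player 1 can guarantee $v$ and Player 2 can defend $v$, then $v$ is the maxmin $\underline{v}$ of $\Gamma_\infty$. The minmax $\overline{v}$ is defined in a dual way, and $\Gamma_\infty$ has a value if $\underline{v} = \overline{v}$.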

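These definitions can be extended to the vector payoff framework: we say that Player 1 can guarantee a set $E$ if he can approach $E$:

$$\forall \varepsilon > 0,\ \exists \sigma_\varepsilon,\ \exists N \in \mathbb{N},\quad \mathbb{E}_{\sigma_\varepsilon,\tau}[d_E(\bar\rho_n)] \le \varepsilon,\ \forall \tau,\ \forall n \ge N.$$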

In the counterexample of the proof of Proposition 1.9, Player 1 cannot guarantee the convex set $C = \{0\}$ and Player 2 cannot defend it since:

$$\exists \sigma,\ \forall \varepsilon > 0,\ \forall \tau,\ \forall N \in \mathbb{N},\ \exists n \ge N,\quad \mathbb{E}_{\sigma,\tau}[d_C(\bar\rho_n)] \le \varepsilon.$$

To keep the terminology of zero-sum repeated games, one could say that the game we constructed has no maxmin.

Blackwell [4] also gave an example of a game (with vector payoffs) without maxmin in the full monitoring case. The main differences between the two examples are:

i) in the partial monitoring case, this set can be convex (which cannot occur in the full monitoring framework);

ii) the strategy of Player 1 is such that the average payoff is infinitely often close to $C$; however, unlike in Blackwell's example, he does not know at which stages.
4 Repeated game with incomplete information on one side, with partial monitoring

Aumann & Maschler [2] introduced the class of two-person zero-sum games with incomplete information on one side. Those games are described as follows: Nature chooses $k_0$ from a finite set of states $K$ according to some known probability $p \in \Delta(K)$. Player 1 (the maximizer) is informed about $k_0$, but Player 2 is not. At stage $m \in \mathbb{N}$, Player 1 (resp. Player 2) chooses $i_m \in I$ (resp. $j_m \in J$) and the payoff is $\rho_m = \rho^{k_0}(i_m,j_m)$. Player 1 observes $j_m$, while Player 2 observes neither $i_m$ nor $\rho_m$ but receives a signal $s_m$ whose law is $s^{k_0}(i_m,j_m) \in \Delta(S)$. As in the previous sections, we define $\mathbf{s}^k(x) = (s^k(x,j))_{j \in J}$ for every $x \in \Delta(I)$.

A strategy $\sigma$ (resp. $\tau$) of Player 1 (resp. Player 2) is a mapping from $K \times \bigcup_{m \in \mathbb{N}}(I \times J \times S)^m$ to $\Delta(I)$ (resp. from $\bigcup_{m \in \mathbb{N}}(J \times S)^m$ to $\Delta(J)$). At stage $m+1$, $\sigma(k,h_m)$ is the law of $i_{m+1}$ after the history $h_m$ if the chosen state is $k$.

We define $\Gamma_1(p)$ as the one-shot game with expected payoff $\sum_{k \in K} p^k \rho^k(x^k,y)$ and $\Gamma_\infty(p)$ as the infinitely repeated game. We denote by $v_\infty(p)$ its value, if it exists (i.e. if both Player 1 and Player 2 can guarantee it). Aumann & Maschler [2] (Theorem C, p. 191) proved that $\Gamma_\infty(p)$ has a value and characterized it.

Let us first introduce the operator Cav and the non-revealing game $D(p)$: for any function $f$ from $\Delta(K)$ to $\mathbb{R}$, $\mathrm{Cav}(f)(\cdot)$ is the smallest (pointwise) concave function greater than $f$.
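For two states, $\Delta(K)$ is a segment and $\mathrm{Cav}(f)$ is the upper concave hull of the graph of $f$. A minimal Python sketch, assuming $f$ is finite and known through its values on an increasing grid of $[0,1]$ (so the result only approximates $\mathrm{Cav}(f)$ between grid points), computes this hull with a monotone-chain scan; the "W"-shaped test function is hypothetical.

```python
def cav(f_values, grid):
    """Concave envelope of f on Delta(K), |K| = 2, identified with [0, 1].

    grid: increasing points p_0 < ... < p_m in [0, 1]; f_values: f(p_i).
    Builds the upper concave hull of the points (p_i, f(p_i)) from left to
    right, then evaluates it at each grid point by linear interpolation.
    """
    hull = []  # indices of upper-hull vertices, left to right
    for i, (p, v) in enumerate(zip(grid, f_values)):
        while len(hull) >= 2:
            (p1, v1), (p2, v2) = [(grid[h], f_values[h]) for h in hull[-2:]]
            # drop the middle vertex if it lies on or below the chord to (p, v)
            if (v2 - v1) * (p - p1) <= (v - v1) * (p2 - p1):
                hull.pop()
            else:
                break
        hull.append(i)
    result = []
    for p in grid:
        for a, b in zip(hull, hull[1:]):
            if grid[a] <= p <= grid[b]:
                t = (p - grid[a]) / (grid[b] - grid[a])
                result.append((1 - t) * f_values[a] + t * f_values[b])
                break
        else:
            result.append(f_values[hull[0]])  # grid with a single point
    return result


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(cav([0, 1, 0, 1, 0], grid))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```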

A profile of mixed actions $\mathbf{x} = (x^k)_{k \in K} \in \Delta(I)^K$ is non-revealing at $p \in \Delta(K)$ (and induces the flag $\mu \in \Delta(S)^J$) if the flag induced by $\mathbf{x}$ is independent of the state:

$$NR(p,\mu) = \big\{\mathbf{x} = (x^1,\dots,x^K) \in \Delta(I)^K;\ \mathbf{s}^k(x^k) = \mu,\ \forall k \text{ s.t. } p^k > 0\big\}.$$

We denote by $NR(p) = \bigcup_{\mu \in \Delta(S)^J} NR(p,\mu)$ the set of non-revealing strategies. For every $\mu \in \Delta(S)^J$, $D(p,\mu)$ (resp. $D(p)$) is the one-stage game $\Gamma_1(p)$ where Player 1 is restricted to $NR(p,\mu)$ (resp. $NR(p)$); its value is denoted by $u(p,\mu)$ (resp. $u(p)$), with $u(p,\mu) = -\infty$ if $NR(p,\mu) = \emptyset$ (resp. $u(p) = -\infty$ if $NR(p) = \emptyset$).
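Checking that a profile is non-revealing only requires comparing the flags induced in the states that have positive probability. A minimal sketch, assuming flags are flattened into tuples of probabilities and using a hypothetical two-state signalling structure in which signal "a" follows action T in state 1 and action B in state 2 (independently of $j$, so one number per state suffices):

```python
def non_revealing(x, p, flag_of, tol=1e-12):
    """Return (True, mu) if x = (x^k)_k induces the same flag mu in every
    state k with p[k] > 0, and (False, None) otherwise.

    x: dict state -> mixed action x^k; p: dict state -> prior probability;
    flag_of(k, x_k): the flag s^k(x^k) as a tuple of probabilities.
    """
    flags = [flag_of(k, x[k]) for k in x if p[k] > 0]
    mu = flags[0]
    same = all(max(abs(a - b) for a, b in zip(f, mu)) <= tol for f in flags)
    return (True, mu) if same else (False, None)


# Hypothetical structure: x^k is the probability of T, and the flag is the
# probability of observing signal "a" (the same for every column j).
def flag_of(k, x_k):
    return (x_k,) if k == 1 else (1.0 - x_k,)


print(non_revealing({1: 0.5, 2: 0.5}, {1: 0.5, 2: 0.5}, flag_of))  # (True, (0.5,))
print(non_revealing({1: 1.0, 2: 1.0}, {1: 0.5, 2: 0.5}, flag_of))  # (False, None)
print(non_revealing({1: 1.0, 2: 1.0}, {1: 1.0, 2: 0.0}, flag_of))  # (True, (1.0,))
```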

Theorem 4.1 (Aumann & Maschler [2]) The game $\Gamma_\infty(p)$ has a value defined by $v_\infty(p) = \mathrm{Cav}(u)(p)$.
Proof. Player 1 can guarantee $u(p)$: indeed, if $NR(p) \neq \emptyset$, he just has to play i.i.d. an optimal strategy in $NR(p)$, and otherwise $u(p) = -\infty$. Therefore, using the splitting procedure (see Lemma 5.2, p. 25 in [2]), Player 1 can guarantee $\mathrm{Cav}(u)(p)$.

It remains to show that Player 2 can also guarantee $\mathrm{Cav}(u)(p)$. The function $\mathrm{Cav}(u)(\cdot)$ is concave and continuous; therefore there exists $m = (m^1,\dots,m^K) \in \mathbb{R}^K$ such that $\mathrm{Cav}(u)(p) = \langle m,p\rangle$ and $u(q) \le \mathrm{Cav}(u)(q) \le \langle m,q\rangle$ for every $q \in \Delta(K)$. Instead of constructing a strategy of Player 2 that minimizes the expected payoff $\sum_{k \in K} p^k \bar\rho^k_n$, it is enough to construct a strategy such that each $\bar\rho^k_n$ is smaller than $m^k$, for every state $k$ that has positive probability according to Player 2's posterior.
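For two states, identify $q \in \Delta(K)$ with its second coordinate, so that $\langle m, q\rangle = m^1(1-q) + m^2 q$. A vector $m$ as above can then be read off the concave envelope: extend the edge of its graph that contains $p$ into a line, and evaluate that line at $q = 0$ and $q = 1$. The sketch assumes the vertices of the envelope are already available (for instance from an upper-hull computation); at a kink any adjacent edge works, and the code returns the left one. The hull used in the example is hypothetical.

```python
def supporting_vector(hull, p):
    """Return m = (m1, m2) with m1*(1-q) + m2*q >= Cav(q) for all q, equality at p.

    hull: vertices (q_i, Cav(q_i)) of the concave envelope, sorted by q_i,
    with q_0 = 0 and q_last = 1. Because Cav is concave and piecewise
    linear, the line through any edge of its graph lies above the graph.
    """
    for (q0, v0), (q1, v1) in zip(hull, hull[1:]):
        if q0 <= p <= q1:
            slope = (v1 - v0) / (q1 - q0)
            return (v0 - slope * q0, v0 + slope * (1 - q0))  # line at q=0, q=1
    raise ValueError("p must lie in [q_0, q_last]")


HULL = [(0.0, 0.0), (0.25, 1.0), (0.75, 1.0), (1.0, 0.0)]
print(supporting_vector(HULL, 0.5))  # (1.0, 1.0)
print(supporting_vector(HULL, 0.1))  # (0.0, 4.0)
```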

Therefore, we consider an auxiliary two-person repeated game with vector payoffs where, at stage $n \in \mathbb{N}$, Player 2 (resp. Player 1) chooses $j_n$ according to $y_n \in \Delta(J)$ (resp. $(i^1_n,\dots,i^K_n)$ according to $(x^1_n,\dots,x^K_n) \in \Delta(I)^K$). Player 2 receives a signal $s_n$ whose law is $s^{k_0}(i^{k_0}_n,j_n)$, where $k_0$ is the true state. We denote by $\mu_n = \mathbf{s}^{k_0}(x^{k_0}_n)$ the expected flag of stage $n$. The $k$-th component of the vector payoff $\rho_n$ is defined by $\rho^k(i^k_n,j_n)$ if $\mu_n$ belongs to $\mathcal{S}^k$, the range of $\mathbf{s}^k$, and by $-A := -\max_{k \in K}\|\rho^k\|_\infty$ otherwise¹. Conversely, the set of compatible payoffs given a flag $\mu \in \Delta(S)^J$, $y \in \Delta(J)$ and a state $k$ is defined by:

$$P^k(\mu,y) = \begin{cases}\{\rho^k(x,y);\ x \in \Delta(I),\ \mathbf{s}^k(x) = \mu\} & \text{if } \mu \in \mathcal{S}^k,\\ \{-A\} & \text{otherwise,}\end{cases}$$

and the set of compatible vector payoffs is $P(\mu,y) = \prod_{k \in K} P^k(\mu,y) \subset \mathbb{R}^K$.

If Player 2 can approach $\mathcal{M} = \{m' \in \mathbb{R}^K;\ m'^k \le m^k,\ \forall k \in K\} = m + \mathbb{R}^K_-$, then he can guarantee $\mathrm{Cav}(u)(p)$. Theorem 1.8 implies that the convex set $\mathcal{M}$ is approachable if and only if, for every $\mu \in \Delta(S)^J$, there exists $y \in \Delta(J)$ such that $P(\mu,y) \subset \mathcal{M}$.

Hence it is enough to prove that this property holds. Assume the contrary: there exists $\mu_0 \in \Delta(S)^J$ such that, for every $y \in \Delta(J)$, $P(\mu_0,y)$ is not included in $\mathcal{M}$.

¹ We use this notation because, if $\mu_n$ is not in the range of $\mathbf{s}^k$, then Player 2 knows that the true state is not $k$, and therefore does not need to minimize the $k$-th component of the payoff vector.

We denote by $K(\mu_0) = \{k \in K;\ \mu_0 \in \mathcal{S}^k\}$ the set of states that are compatible with $\mu_0$: if Player 2 observes $\mu_0$, then he knows that the true state is in $K(\mu_0)$. For every $y \in \Delta(J)$ and $k \in K(\mu_0)$, $w^k_0(y) = \sup_{\mathbf{s}^k(x^k) = \mu_0}\rho^k(x^k,y)$ is the worst payoff for Player 2 in state $k$. The fact that $P(\mu_0,y)$ is not included in $\mathcal{M}$ implies that $w_0(y) = (w^k_0(y))_{k \in K(\mu_0)}$ does not belong to $\mathcal{M}_0 = \{m' \in \mathbb{R}^{K(\mu_0)};\ m'^k \le m^k,\ \forall k \in K(\mu_0)\}$. Define the convex set:



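$$\mathcal{W}_0 = \Big(\{w_0(y);\ y \in \Delta(J)\} + \mathbb{R}_+^{K(\mu_0)}\Big) \cap B(0,A),$$

with $B(0,A)$ the closed ball of radius $A$. Obviously $\mathcal{W}_0 \cap \mathcal{M}_0 = \emptyset$ and, since each $w^k_0$ is convex (as a supremum of linear functions of $y$), $\mathcal{W}_0$ is a compact convex set. So there exists a strongly separating hyperplane $H_0 = \{\omega \in \mathbb{R}^{K(\mu_0)};\ \langle\omega, q_0\rangle = b\}$ such that $\sup_{m' \in \mathcal{M}_0}\langle m', q_0\rangle < b < \min_{\omega \in \mathcal{W}_0}\langle\omega, q_0\rangle$. Every component of $q_0$ must be nonnegative (since $\mathcal{M}_0$ is negatively comprehensive); therefore, up to a normalization, we can assume that $q_0$ belongs to $\Delta(K(\mu_0))$.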

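Define $\mathcal{W} = \mathcal{W}_0 \times \mathbb{R}^{K \setminus K(\mu_0)}$ and $q \in \Delta(K)$ by $q^k = q_0^k$ if $k \in K(\mu_0)$ and $q^k = 0$ otherwise. Then $H = \{\omega \in \mathbb{R}^K;\ \langle\omega, q\rangle = b\}$ strongly separates $\mathcal{M}$ and $\mathcal{W}$; therefore:

$$\langle m, q\rangle < \min_{\omega \in \mathcal{W}}\langle\omega, q\rangle = \min_{y \in \Delta(J)}\ \max_{\mathbf{x} \in NR(q,\mu_0)}\ \sum_{k \in K} q^k \rho^k(x^k, y) = u(q,\mu_0) \le u(q),$$

and, by definition of $m$, $u(q) \le \langle m, q\rangle$, which is impossible.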

So $\mathcal{M}$ is approachable by Player 2; he can guarantee $\mathrm{Cav}(u)(p)$ in $\Gamma_\infty(p)$, and $v_\infty(p) = \mathrm{Cav}(u)(p)$. □